Conversation
- Add rarfile dependency to pmultiqc_service for extracting RAR archives - Update download_pride_file to handle .rar file extraction - Update filter_search_files to include RAR archives in search results - Update FragPipe example in UI from PXD066146 to PXD062399 (DDA.rar)
Update FragPipe example in PRIDE service
📝 WalkthroughWalkthroughThe changes introduce RAR archive extraction support to the MultiQC service by adding the rarfile dependency and implementing RAR detection and extraction logic in the PRIDE download workflow. Additionally, example dataset entries in the UI are updated with new accession numbers. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@pmultiqc_service/app.py`:
- Around line 962-971: The RAR extraction branch (the elif checking
filename.lower().endswith(".rar") that opens rarfile.RarFile) lacks the safety
checks implemented in validate_and_extract_zip; update this branch to validate
each archive entry for path traversal (reject entries with absolute paths or
'..' components), enforce a max file count limit and an archive-bomb
compression-ratio check when extracting entries, and catch rarfile.BadRarFile
and other extraction errors to log and fail gracefully; preserve behavior of
removing the original .rar and setting final_file_path to download_dir only on
successful, validated extraction.
In `@pmultiqc_service/requirements.txt`:
- Around line 15-16: The requirements list includes the Python wrapper package
"rarfile" which depends on an external RAR extractor
(unrar/unar/7z/p7zip/bsdtar); update your deployment configuration (e.g.,
Dockerfile or provisioning scripts) to install one of these system tools
(install package like unrar or p7zip-full) and add a note to the project
README/deployment docs describing this runtime dependency, and also pin
"rarfile" to a minimum compatible version in requirements.txt (e.g., change
"rarfile" to a bounded spec like "rarfile>=X.Y") to match the existing version
pinning practice.
| elif filename.lower().endswith(".rar"): | ||
| logger.info(f"Extracting RAR file: {filename}") | ||
| with rarfile.RarFile(file_path, "r") as rar_ref: | ||
| # Extract to the same directory | ||
| rar_ref.extractall(download_dir) | ||
| # Remove the rar file | ||
| os.remove(file_path) | ||
| # Return the directory path since rar files contain multiple files | ||
| final_file_path = download_dir | ||
| logger.info(f"Extracted {filename} to directory") |
There was a problem hiding this comment.
Add security validation for RAR extraction similar to ZIP validation.
The RAR extraction lacks the security checks that validate_and_extract_zip (lines 1691-1740) implements for user uploads:
- Path traversal validation on entry names
- Archive bomb detection (compression ratio check)
- File count limits
While PRIDE datasets are presumably trusted, defense-in-depth is recommended. Consider also adding error handling for rarfile.BadRarFile exceptions.
🔒 Suggested security-hardened implementation
elif filename.lower().endswith(".rar"):
logger.info(f"Extracting RAR file: {filename}")
- with rarfile.RarFile(file_path, "r") as rar_ref:
- # Extract to the same directory
- rar_ref.extractall(download_dir)
- # Remove the rar file
- os.remove(file_path)
- # Return the directory path since rar files contain multiple files
- final_file_path = download_dir
- logger.info(f"Extracted {filename} to directory")
+ try:
+ with rarfile.RarFile(file_path, "r") as rar_ref:
+ # Security: Check for path traversal in RAR entry names
+ for entry in rar_ref.infolist():
+ if ".." in entry.filename or entry.filename.startswith("/"):
+ raise Exception(f"RAR file contains invalid path: {entry.filename}")
+ # Extract to the same directory
+ rar_ref.extractall(download_dir)
+ # Remove the rar file
+ os.remove(file_path)
+ # Return the directory path since rar files contain multiple files
+ final_file_path = download_dir
+ logger.info(f"Extracted {filename} to directory")
+ except rarfile.BadRarFile as e:
+ logger.error(f"Invalid RAR file {filename}: {e}")
+ raise Exception(f"Invalid RAR file: {filename}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| elif filename.lower().endswith(".rar"): | |
| logger.info(f"Extracting RAR file: {filename}") | |
| with rarfile.RarFile(file_path, "r") as rar_ref: | |
| # Extract to the same directory | |
| rar_ref.extractall(download_dir) | |
| # Remove the rar file | |
| os.remove(file_path) | |
| # Return the directory path since rar files contain multiple files | |
| final_file_path = download_dir | |
| logger.info(f"Extracted {filename} to directory") | |
| elif filename.lower().endswith(".rar"): | |
| logger.info(f"Extracting RAR file: {filename}") | |
| try: | |
| with rarfile.RarFile(file_path, "r") as rar_ref: | |
| # Security: Check for path traversal in RAR entry names | |
| for entry in rar_ref.infolist(): | |
| if ".." in entry.filename or entry.filename.startswith("/"): | |
| raise Exception(f"RAR file contains invalid path: {entry.filename}") | |
| # Extract to the same directory | |
| rar_ref.extractall(download_dir) | |
| # Remove the rar file | |
| os.remove(file_path) | |
| # Return the directory path since rar files contain multiple files | |
| final_file_path = download_dir | |
| logger.info(f"Extracted {filename} to directory") | |
| except rarfile.BadRarFile as e: | |
| logger.error(f"Invalid RAR file {filename}: {e}") | |
| raise Exception(f"Invalid RAR file: {filename}") |
🤖 Prompt for AI Agents
In `@pmultiqc_service/app.py` around lines 962 - 971, The RAR extraction branch
(the elif checking filename.lower().endswith(".rar") that opens rarfile.RarFile)
lacks the safety checks implemented in validate_and_extract_zip; update this
branch to validate each archive entry for path traversal (reject entries with
absolute paths or '..' components), enforce a max file count limit and an
archive-bomb compression-ratio check when extracting entries, and catch
rarfile.BadRarFile and other extraction errors to log and fail gracefully;
preserve behavior of removing the original .rar and setting final_file_path to
download_dir only on successful, validated extraction.
| tuspyserver>=4.2.0 | ||
| rarfile No newline at end of file |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if rarfile is documented in Dockerfile or deployment configs
fd -e Dockerfile -e dockerfile -e docker-compose -e docker-compose.yml -e "*.yaml" -e "*.yml" | head -20Repository: bigbio/pmultiqc
Length of output: 41
🌐 Web query:
Python rarfile package system requirements external dependencies
💡 Result:
Short summary — rarfile system requirements and external dependencies:
- Python: requires Python 3.6+ (use 3.8+ recommended). [1][3]
- External decompression programs (rarfile calls an external tool to extract compressed files):
- Preferred: unrar (official non‑free UnRAR binary) — full RAR3/RAR5 support. [1][3]
- Alternatives: unar, 7z/p7zip (may need p7zip-rar), bsdtar/libarchive (limited RAR support). Behavior/feature support varies by backend. [1][3]
- Crypto for password‑protected headers: optional Python crypto libraries such as cryptography or PyCrypto/PyCryptodome. [2]
- Platform notes: some backends are unavailable or limited on certain OSes (e.g., unar stdout support on Windows is poor; bsdtar has limited RAR feature support). See backend table for details. [1]
Sources: rarfile documentation and project page; Debian package FAQ. [1][2][3]
Add external RAR extraction tool to deployment configuration.
The rarfile package is a Python wrapper that requires an external RAR extraction program to function (unrar, unar, 7z/p7zip, or bsdtar). These tools must be installed in the deployment environment—for example, in the Docker image if used for containerized deployment.
Ensure the appropriate tool is installed in your deployment configuration and document this dependency. Additionally, consider pinning a minimum version for rarfile to match the version pinning practice used for other dependencies in this file.
🤖 Prompt for AI Agents
In `@pmultiqc_service/requirements.txt` around lines 15 - 16, The requirements
list includes the Python wrapper package "rarfile" which depends on an external
RAR extractor (unrar/unar/7z/p7zip/bsdtar); update your deployment configuration
(e.g., Dockerfile or provisioning scripts) to install one of these system tools
(install package like unrar or p7zip-full) and add a note to the project
README/deployment docs describing this runtime dependency, and also pin
"rarfile" to a minimum compatible version in requirements.txt (e.g., change
"rarfile" to a bounded spec like "rarfile>=X.Y") to match the existing version
pinning practice.
Pull Request
Description
Brief description of the changes made in this PR.
Type of Change
Summary by CodeRabbit
New Features
Updates
✏️ Tip: You can customize this high-level summary in your review settings.